Spatial Analyses

Chapter 12

As in previous weeks, for today’s lab you are to create a new R markdown document using the class lab report template. Within your .Rmd document, include all you code, resulting plots, and answers to questions below.

When you are done with your report, use knitr to convert it to .PDF format to submit on Canvas. It is important that you document each step of your workflow using comments and that you break up the sections of your analysis into SEPARATE code chunks.

Today’s investigation. In this chapter you will learn about Light Detection and Ranging (LiDAR) data. You will learn to use point cloud data and lidar rasters in R and explore using QGIS - a free, open-source GIS tool. You will also explore the concept of uncertainty surrounding lidar raster data (and remote sensing data in general).


Intro to Lidar Data

Lidar is an active remote sensing technique that measures vegetation height. Learn more about discrete and full waveform LIDAR and how to use LIDAR data.

After completing this tutorial, you will be able to:

Lidar or Light Detection and Ranging is an active remote sensing system that can be used to measure vegetation height across wide areas.

Lidar data collected at the Soaproot Saddle site by the National Ecological Observatory Network Airborne Observation Platform (NEON AOP). Source: Keith Krauss, NEON.
Lidar data collected at the Soaproot Saddle site by the National Ecological Observatory Network Airborne Observation Platform (NEON AOP). Source: Keith Krauss, NEON.

Lidar Background

Watch the videos below to better understand what lidar is and how a lidar system works.

The Story of Lidar Data

How Lidar Works

Getting Started

Why Lidar

Scientists often need to characterize vegetation over large regions. We use tools that can estimate key characteristics over large areas because we don’t have the resources to measure each individual tree. These tools often use remote methods. Remote sensing means that we aren’t actually physically measuring things with our hands, we are using sensors which capture information about a landscape and record things that we can use to estimate conditions and characteristics.

Conventional on the ground methods to measure trees are resource intensive and limit the amount of vegetation that can be characterized. Source: National Geographic. To measure vegetation across large areas we need remote sensing methods that can collect many measurements quickly using automated sensors. These measurements can be used to estimate forest structure across larger areas. Lidar (also sometimes referred to as active laser scanning) is one remote sensing method that can be used to map structure including vegetation height, density and other characteristics across a region. Lidar directly measures the height and density of vegetation (as well as buildings and other objects) on the ground making it an ideal tool for scientists studying vegetation over large areas.

LEFT: Remote sensing systems that measure energy that is naturally available are called passive sensors. RIGHT: Active sensors emit their own energy from a source on the instrument itself. Source: Natural Resources Canada.
LEFT: Remote sensing systems that measure energy that is naturally available are called passive sensors. RIGHT: Active sensors emit their own energy from a source on the instrument itself. Source: Natural Resources Canada.

Lidar is an Active Remote Sensing System

Lidar is an active remote sensing system. An active system means that the system itself generates energy - in this case light - to measure things on the ground. In a lidar system, light is emitted from a rapidly firing laser. This light travels to the ground and reflects off of things like buildings and tree branches. The reflected light energy then returns to the lidar sensor where it is recorded.

A lidar system measures the time it takes for emitted light to travel to the ground and back. That time is used to calculate distance traveled. Distance traveled is then converted to elevation. These measurements are made using the key components of a lidar system including a GPS that identifies the X,Y,Z location of the light energy and an Internal Measurement Unit (IMU) that provides the orientation of the plane in the sky.

How Light Energy is Used to Measure Trees

Light energy is a collection of photons. As the photons that make up light move towards the ground, they hit objects such as branches on a tree. Some of the light reflects off of those objects and returns to the sensor. If the object is small and there are gaps surrounding it that allow light to pass through, some light continues down towards the ground. Because some photons reflect off of things like branches but others continue down towards the ground, multiple reflections may be recorded from one pulse of light.

The distribution of energy that returns to the sensor creates what we call a waveform. The amount of energy that returned to the lidar sensor is known as “intensity”. The areas where more photons or more light energy returns to the sensor create peaks in the distribution of energy. Theses peaks in the waveform often represent objects on the ground like a branch, a group of leaves or a building.

LEFT: Remote sensing systems that measure energy that is naturally available are called passive sensors. RIGHT: Active sensors emit their own energy from a source on the instrument itself. Source: Natural Resources Canada.
LEFT: Remote sensing systems that measure energy that is naturally available are called passive sensors. RIGHT: Active sensors emit their own energy from a source on the instrument itself. Source: Natural Resources Canada.

How Scientists Use Lidar Data

There are many different uses for lidar data:

Lidar data have historically been used to generate high resolution elevation datasets. Source: National Ecological Observatory Network.
Lidar data have historically been used to generate high resolution elevation datasets. Source: National Ecological Observatory Network.

Lidar data have also been used to derive information about vegetation structure including:

Cross section showing lidar point cloud data (above) and the corresponding landscape profile (below). Graphic: Leah A. Wasser.
Cross section showing lidar point cloud data (above) and the corresponding landscape profile (below). Graphic: Leah A. Wasser.

Discrete vs. Full Waveform Lidar

A waveform or distribution of light energy is what returns to the lidar sensor. This return may be recorded in two different ways:

Lidar File Formats

Whether they are collected as discrete points or full waveform, most often lidar data are available as discrete points. A collection of discrete return lidar points is known as a lidar point cloud.

The commonly used file format to store lidar point cloud data is the .las format. The .laz format is a highly compressed version of .las and is becoming more widely used.

Lidar Data Attributes: X, Y, Z, Intensity and Classification

Lidar data attributes can vary, depending upon how the data were collected and processed. You can determine what attributes are available for each lidar point by looking at the metadata.

All lidar data points will have:

  • X,Y Location Information: determines the x,y coordinate location of the object that the lidar pulse (the light) reflected off of

  • Z (elevation values): represents the elevation of the object that the lidar pulse reflected off of

Most lidar data points will have:

  • Intensity: represents the amount of light energy recorded by the sensor.

Classified Lidar Point Clouds

Some lidar point cloud data will also be “classified”. Classification refers to tagging each point with the object that it reflected off of. So if a pulse reflects off of a tree branch, you would assign it to the class “vegetation.” If the pulse reflects off of the ground, you would assign it to the class “ground.” Classification of lidar point clouds is an additional processing step. Classification simply represents the type of object that the laser return reflected off of. So if the light energy reflected off of a tree, it might be classified as “vegetation”. And if it reflected off of the ground, it might be classified as “ground”.

Some lidar products will be classified as “ground/non-ground.” Some datasets will be further processed to determine which points reflected off of buildings and other infrastructure. Some lidar data will be classified according to the vegetation type.

What is a Data Product?

A data product is the data that are DERIVED from an instrument, or information collected on the ground. For instance, you may go out in the field and measure the heights of trees at 20 plots then calculate an average height per plot. The average value is DERIVED from the individual measurements that you collected in the field.

When dealing with sensor data, the sensors often collect data in a format that needs to be processed in order to get usable values from it.

A cross section of 3-dimensional lidar data. This point cloud data product is classified into the classes: vegetation and ground points. Brown points represent ground, green represent vegetation (trees).
A cross section of 3-dimensional lidar data. This point cloud data product is classified into the classes: vegetation and ground points. Brown points represent ground, green represent vegetation (trees).

Lidar Point Clouds]

The point cloud is one of the commonly found lidar data products and is the “native” format for discrete return lidar data. In this lesson you will explore some point cloud data using the plas.io viewer.

Explore Lidar Points in plas.io

Lidar data collected over Grand Mesa, Colorado. Source: NEON Airborne Observation Platform (AOP).
Lidar data collected over Grand Mesa, Colorado. Source: NEON Airborne Observation Platform (AOP).

In this activity, you will open a .las file, in the plas.io free online lidar data viewer. You will then explore some of the attributes associated with a lidar data point cloud.

Lidar Attribute Data

Remember that not all lidar data are created equally. Different lidar data may have different attributes. In this activity, you will look at data that contain both intensity values and a ground vs non ground classification.

About plas.io: Plasio is a project by Uday Verma and Howard Butler that implements point cloud rendering capability in a browser. Specifically, it provides a functional implementation of the ASPRS LAS format, and it can consume LASzip-compressed data using LASzip NaCl module.

1. Open a .las File in plas.io

  • If you haven’t already, download the week 3 dataset - linked at the top of this page. It contains several .laz format point cloud datasets that you will use in this lesson.
  • When the download is complete, drag one of the .laz files into the plas.io website. window.
  • Zoom and pan around the data.
  • Use the particle size slider to adjust the size of each individual lidar point. NOTE: the particle size slider is located a little more than half way down the plas.io toolbar in the “Data” section If the data imported into the plas.io viewer correctly, you should see something similar to the screenshot below:
You can drag a .las or .laz dataset into the plas.io viewer to view the data in your browser!
You can drag a .las or .laz dataset into the plas.io viewer to view the data in your browser!

How the Points are Colored - Why is Everything Grey When the Data are Loaded?

Notice that the data, upon initial view, are colored in a black - white color scheme. These colors represent the data’s intensity values. Remember that the intensity value for each lidar point represents the amount of light energy that reflected off of an object and returned to the sensor. In this case, darker colors represent LESS light energy returned. Lighter colors represent MORE light returned.

Lidar intensity values represent the amount of light energy that reflected off of an object and returned to the sensor.
Lidar intensity values represent the amount of light energy that reflected off of an object and returned to the sensor.

2. Adjust the Intensity Threshold

Next, scroll down through the tools in plas.io. Look for the Intensity Scaling slider. The intensity scaling slider allows you to define the thresholds of light to dark intensity values displayed in the image (similar to stretching values in an image processing software or even in photoshop).

Drag the slider back and forth. Notice that you can brighten up the data using the slider.

The intensity scaling slider is located below the color map tool so it’s easy to miss. Drag the slider back and forth to adjust the range of intensity values and to brighten up the lidar point clouds.
The intensity scaling slider is located below the color map tool so it’s easy to miss. Drag the slider back and forth to adjust the range of intensity values and to brighten up the lidar point clouds.

3. Change the Lidar Point Cloud Color Options to Classification

In addition to intensity values, these lidar data also have a classification value. Lidar data classification values are numeric, ranging from 0-20 or higher. Some common classes include:

  • 0 Not classified
  • 1 Unassigned
  • 2 Ground
  • 3 Low vegetation
  • 4 Medium vegetation
  • 5 High vegetation
  • 6 Building
Blue and Orange gradient color scheme submitted by Kendra Sand. Which color scheme is your favorite?
Blue and Orange gradient color scheme submitted by Kendra Sand. Which color scheme is your favorite?

In this case, these data are classified as either ground, or non-ground. To view the points, colored by class:

  • Change the “colorization” setting to “Classification””
  • Change the intensity blending slider to “All Color”
  • For kicks - play with the various colormap options to change the colors of the points

Gridded or Raster Lidar Data Products

Point clouds provide a lot of information, scientifically. However, they can be difficult to work with given the size of the data and tools that are available to handle large volumns of points. Lidar data products are often created and stored in a gridded or raster data format. The raster format can be easier for many people to work with and also is supported by many different commonly used software packages.

LEFT: Lidar data points overlayed on top of a hillshade which represents elevationin a graphical 3-dimensional view. RIGHT: If you zoom in on a portion of the data, you will see that the elevation data consists of cells or pixels and there are lidar data points that fall within most of the pixels.
LEFT: Lidar data points overlayed on top of a hillshade which represents elevationin a graphical 3-dimensional view. RIGHT: If you zoom in on a portion of the data, you will see that the elevation data consists of cells or pixels and there are lidar data points that fall within most of the pixels.

What is a Raster?

aster or “gridded” data are stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth’s surface. A raster file is a composed of regular grid of cells, all of which are the same size. You’ve looked at and used rasters before if you’ve looked at photographs or imagery in a tool like Google Earth. However, the raster files that you will work with are different from photographs in that they are spatially referenced. Each pixel represents an area of land on the ground. That area is defined by the spatial resolution of the raster.

A raster is composed of a regular grid of cells. Each cell is the same size in the x and y direction. Source: Colin Williams, NEON.
A raster is composed of a regular grid of cells. Each cell is the same size in the x and y direction. Source: Colin Williams, NEON.

Raster Facts

A few notes about rasters:

  • Each cell is called a pixel.
  • Each pixel represents an area on the ground.
  • The resolution of the raster is the area that each pixel represents on the ground. So a 1-meter resolution raster means that each pixel represents a 1m by 1m area on the ground.

A raster dataset can have attributes associated with it as well. For instance in a lidar derived digital elevation model (DEM), each cell represents an elevation value for that location on the earth. In a lidar derived intensity image, each cell represents a lidar intensity value or the amount of light energy returned to and recorded by the sensor.

Rasters can be stored at different resolutions. The resolution simply represents the size of each pixel cell. Source: Colin Williams, NEON.
Rasters can be stored at different resolutions. The resolution simply represents the size of each pixel cell. Source: Colin Williams, NEON.

Creating a Raster from Lidar Point Clouds

Point to Raster Methods - Gridding

There are different ways to create a raster from lidar point clouds. Let’s look at one of the most basic ways to create a raster file points: gridding. When you grid raster data, you calculate a value for each pixel or cell in your raster dataset using the points that are spatially located within that cell. To do this:

  • A grid is placed on top of the lidar data in geographic space. Each cell in the grid has the same spatial dimensions. These dimensions represent that particular area on the ground. If you want to derive a 1m resolution raster from the lidar data, you overlay a 1m by 1m grid over the lidar data points.
  • Within each 1m x 1m cell, you calculate a value to be applied to that cell, using the lidar points found within that cell. The simplest method of doing this is to take the max, min or mean height value of all lidar points found within the 1m cell. If you use this approach, you might have cells in the raster that don’t contain any lidar points. These cells will have a “no data” value if you process your raster in this way.

Point to Raster Methods - Interpolation

A different approach is to interpolate the value for each cell. Interpolation considers the values of points outside of the cell in addition to points within the cell to calculate a value. Interpolation also often uses statistical operations (math) to calculate the cell value. Interpolation is useful because it can provide us with some ability to predict or calculate cell values in areas where there are no data (or no points). And to quantify the error associated with those predictions which is useful to know, if you are doing research. We won’t be using interpolation in this lab.

Materials and Methods]

To work with raster data in R, you can use the raster and sf packages.

# load libraries
library(raster)
library(sf)
library(ggplot2)

You use the raster(“path-to-raster-here”) function to open a raster dataset in R. Note that you use the plot() function to plot the data. The function argument main = “” adds a title to the plot.

# open raster data
lidar_dem <- raster(x = "BLDR_LeeHill/pre-flood/lidar/pre_DTM.tif")

# plot raster data
plot(lidar_dem,
     main = "Digital Elevation Model - Pre 2013 Flood")

If you zoom in on a small section of the raster, you can see the individual pixels that make up the raster. Each pixel has one value associated with it. In this case that value represents the elevation of ground.

Note that you are using the xlim= argument to zoom in to on region of the raster. You can use xlim and ylim to define the x and y axis extents for any plot.

# zoom in to one region of the raster
plot(lidar_dem,
  xlim = c(473000, 473030), # define the x limits
  ylim = c(4434000, 4434030), # define y limits for the plot
     main = "Lidar Raster - Zoomed into one small region")

Next, let’s discuss some of the important spatial attributes associated with raster data.

Coordinate Reference System]

The coordinate reference system (CRS) of a spatial object tells R where the raster is located in geographic space. It also tells R what method should be used to “flatten” or project the raster in geographic space.

Maps of the United States in different projections. Notice the differences in shape associated with each different projection. These differences are a direct result of the calculations used to “flatten” the data onto a 2-dimensional map. Source: M. Corey, opennews.org
Maps of the United States in different projections. Notice the differences in shape associated with each different projection. These differences are a direct result of the calculations used to “flatten” the data onto a 2-dimensional map. Source: M. Corey, opennews.org

What Makes Spatial Data Line Up on a Map?

You will learn CRS in more detail in next weeks class. For this week, just remember that data from the same location but saved in different coordinate references systems will not line up in any GIS or other program. Thus, it’s important when working with spatial data in a program like R to identify the coordinate reference system applied to the data and retain it throughout data processing and analysis.

View Raster CRS in R

You can view the CRS string associated with your R object using the crs() method. You can assign this string to an R object too.

# view resolution units
crs(lidar_dem)
## CRS arguments:
##  +proj=utm +zone=13 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0

# assign crs to an object (class) to use for reprojection and other tasks
myCRS <- crs(lidar_dem)
myCRS
## CRS arguments:
##  +proj=utm +zone=13 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0

The CRS string for our lidar_dem object tells us that your data are in the UTM projection.

~The UTM zones across the continental United States. Source: Chrismurf, wikimedia.org.

The CRS format, returned by R, is in a PROJ 4 format. This means that the projection information is strung together as a series of text elements, each of which begins with a + sign.

+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0

You’ll focus on the first few components of the CRS in this activity:

  • +proj=utm The projection of the dataset. Your data are in Universal Transverse Mercator (UTM).
  • +zone=18 The UTM projection divides up the world into zones, this element tells you which zone the data is in. Harvard Forest is in Zone 18.
  • +datum=WGS84 The datum was used to define the center point of the projection. Your raster uses the WGS84 datum.
  • +units=m This is the horizontal units that the data are in. Your units are meters.

Important: You are working with lidar data which has a Z or vertical value as well. While the horizontal units often match the vertical units of a raster they don’t always! Be sure to check the metadata of your data to figure out the vertical units!

Spatial Extent

Great Work!

This lab activity was originally written by [https://www.earthdatascience.org/courses/earth-analytics/spatial-data-r/](https://wwmodified by Jenna Ekwealor.